home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Language/OS - Multiplatform Resource Library
/
LANGUAGE OS.iso
/
lisp
/
elk-2_0.lha
/
elk-2.0
/
doc
/
paper
/
paper.tr
< prev
next >
Wrap
Text File
|
1992-10-15
|
69KB
|
1,589 lines
.\" .fp 1 PR
.\" .fp 2 PI
.\" .fp 3 PB
.fp 5 C
.de P
.br
.ne 3
.sp .4
.LP
.ft 2
..
. \" Scheme code start
.de Ss
.KS
.ta 8.5c
.ft 5
.ps -2
.vs -2
.nf
.in 1c
.sp .3c
..
. \" Scheme code end
.de Se
.ft
.ps
.vs
.fi
.in
.KE
.sp .5
..
. \" Newline in Scheme code
.de Sl
.sp .5
..
.nr lS 0 1
. \" Listing start
.de Ls
.br
.KF
.sp .5
.LP
\l'\\n(.lu_'
..
. \" Listing caption: .Lc caption
.de Lc
.sp .2
.ce 999
\f3\s-1Listing \\n+(lS:\fP \c
\\$1\s0
.if !\(bs\\$2\(bs\(bs \s-1\&\\$2\s0
.ce 0
..
. \" Listing end: .Le reference
.de Le
.tm s/@L(\\$1)/\\n(lS/g
.LP
\l'\\n(.lu_'
.sp
.KE
..
. \" Notes start (at end of listing)
.de Ns
.sp
.ps -2
.vs -2
.in 1c
.ll -1c
..
. \" Notes end
.de Ne
.ll
.in
.vs
.ps
.sp -.3
..
.ds E Elk
.TL
Elk: The Extension Language Kit
.AU
Oliver Laumann, Carsten Bormann
.AI
Technische Universit\*at Berlin, Germany
.AB
In the past, users generally were at the mercy of the authors of an
application when it came to adapting it to their individual needs and
tastes.
Fitting an application with an \f2extension language\fP (or \f2embedded
language\fP) enables users to customize and enhance it without having to
modify its source code.
Recently, variants of the programming language Lisp have become increasingly
popular for this purpose, to the point where the abundance of different
dialects has become a problem.
Of the two standardized dialects of Lisp, only \f2Scheme\fP is suitably
modest, yet sufficiently general, to serve as an extension language.
.PP
\f2Elk\fP, the \f2Extension Language Kit\fP, is a Scheme implementation
that is intended to be used as a general, reusable extension language
subsystem for integration into existing and future applications.
Applications can define their own Scheme data types and primitives, which
provide for a tightly-knit integration of the C/C++ parts of the application
with Scheme code.
Library interfaces such as those to various X Window System libraries
show the effectiveness of this approach.
Various features of Elk such as dynamic loading of object files and
freezing of fully customized applications into executables
(implemented in those UNIX environments where it was feasible)
increase its usability as the backbone of a complex application.
Elk has been used as such for nearly five years within a
locally-developed ODA-based multimedia document editor; it has been
used in numerous other projects after it could be made freely
available three years ago.
.AE
.NH
Introduction
.PP
The designers and implementors of a large or complex application can
rarely anticipate all requirements future users will have on the
application.
Typically, users wish to be able to customize the user-interfaces
of applications according to their personal tastes or requirements,
or they have the desire to extend the functionality of an application
(either by combining existing functions into new ones or by adding
entirely new capabilities).
This is especially true for daily-used applications such as text
editors and for applications with a high degree of user-interaction
or with complex graphical user-interfaces.
.PP
Any application can certainly be customized by modifying its source
code and recompiling it.
But this approach is often not feasible, as the source code of the
application or the tools needed to recompile it may not be available.
Even if it were feasible, it would be a time-consuming process;
it would be hard to keep up with new releases of the application;
and the coexistence of multiple, similar versions of the same
application would be a general maintenance headache.
.PP
The alternative to this approach is to not ``hard-wire'' the entire
functionality and all external aspects of an application in the source
code at all, but to provide means to customize the application's
behavior later by it's users.
.P
Early Customization and Extension Languages
.PP
Many applications support at least simple methods for customization,
such as command line options or configuration files.
More powerful tools for customization are \f2macro languages\fP,
\f2command languages\fP, or \f2scripting languages\fP that are
typically found in text editors and word processors.
Prominent examples of such customization and extension languages
are the macro language of the now legendary TECO editor and,
in UNIX, the macro language of the \f2troff\fP text formatter
[Ossanna 1979] and the configuration language of the \f2sendmail\fP
program.
.PP
Although many of these classic extension languages are quite
powerful (some of them are full-fledged programming languages),
they have a reputation of being ``cryptic'' and hard to understand
and use by untrained users.
The prevailing opinion seems to be that only experts can actually
benefit from these types of extension languages (for example,
people who have mastered the \f2sendmail\fP configuration language
in all details are commonly appointed the status of a ``guru'').
In fact, it can be observed that only very few users of the \f2troff\fP
text formatter (whose macro language is reputed to be particularly
cryptic) are using macro packages written by themselves; many users
give up after some time and fall back on vendor-supplied macro
packages or packages written by a ``troff-guru''.
.PP
Experience also indicates that simplified or specialized extension
languages often grow and have more features added until they resemble
a full programming language.
Such ``organically grown'' extension languages are likely to be
contorted designs as they will consist of several levels of extensions
glued on to their initial, more limited design.
.P
High-Level Extension Languages
.PP
Recently application designers have begun to abandon
specialized and cryptic macro-style extension languages in
favor of extension languages that resemble usual high-level
programming languages, mainly languages with Algol/Pascal-style
or Lisp-style syntax and semantics.
Prominent examples of such high-level extension languages are
TPU developed by DEC, the \f2Ness\fP language of the Andrew
Toolkit [Hansen 1990], AutoDesk's CAD extension language (a dialect
of Lisp), and \f2Emacs-Lisp\fP, the extension
language of Richard Stallman's popular GNU Emacs editor
[Stallman 1981, Lewis et al. 1990].
.PP
Emacs was the first wide-spread application to employ an
already existing and widely used high-level programming
language as its extension and customization language.
Emacs-Lisp is a dynamically-scoped dialect of Lisp with additional
functionality for text-editing oriented operations.
The approach taken by Emacs has been tremendously successful;
during the last years users of Emacs have contributed a wealth of
extensions written in Emacs-Lisp.
.P
Elk as a General, Reusable Extension Language
.PP
Using Lisp or Lisp-style languages as extension languages
seems to enjoy growing popularity; several applications besides
Emacs now use dialects of Lisp as their extension language.
This development has one disadvantage: the number of coexisting
incompatible (but similar) extension languages is continually growing.
Users have to learn a new language for each new application,
and application writers keep implementing new extension language
interpreters instead of reusing existing ones.
.PP
These problems can be solved by making available a general,
reusable extension language implementation that application
writers can include into their applications.
The main objective of the \f2Elk\fP project was to develop such a
generic extension language and make it freely available to
encourage application writers to use it in their applications.
.NH
Overview of the Extension Language Kit
.P
The Evolution of Elk
.PP
The development of \*E began when we were searching for a suitable
extension language implementation for ISOTEXT [Bormann et. al. 1988,
Bormann 1991],
an ODA-based document processing system with a graphical user-interface.
ISOTEXT is almost entirely written in C++; its user-interface is
based on the X window system [Scheifler et al. 1986]
and the OSF/Motif widget set.
Customizability and extensibility by means of a full extension language
was a basic requirement on the design of ISOTEXT.
.PP
As we considered language design to be the domain of a
``selected few'', we decided to use an existing programming
language as the basis for the extension language of ISOTEXT.
This decision was also influenced by our desire to develop a
general, reusable extension language implementation that is
not hard-wired into one specific application.
For a number of reasons an interpreted language seemed preferable:
it should be possible to add extensions to or modify extensions in
a running application without re-linking it;
bugs in extensions should not crash the application;
interpreted languages usually offer better debugging facilities;
and implementing an interpreter generally requires less
efforts than implementing a compiler.
.PP
From the beginning we favored Lisp or a dialect of Lisp
as the basis for a general extension language.
Most dialects of the Lisp family are ``small'', easy to implement,
general-purpose languages with simple syntax and powerful semantics,
and the suitability of Lisp as an extension language had already been
demonstrated by several applications, among them GNU Emacs.
Early in the project we considered to use Emacs-Lisp,
but it appeared infeasible to isolate the Lisp-interpreter
from the rest of Emacs.
In addition, at the time we investigated Emacs-Lisp it was lacking
several desirable language features, such as support for floating point
and arbitrary precision numbers (\f2bignums\fP).
We also considered to use MIT Scheme [MIT 1984], but due to the enormous
size of its implementation it would have dominated the size of
the application.
.P
Scheme as an Extension Language
.PP
As other implementations of Lisp or Lisp-like languages available at
the time of our investigations did not meet our requirements, we
finally decided to write an interpreter for the Lisp dialect Scheme
[Clinger et al. 1991, Dybvig 1987, Springer et al. 1989,
Abelson et al. 1985].
This Scheme interpreter is the main component of the \*E package.
Scheme is a simplified, ``cleaned-up'' dialect of Lisp with
first-class procedures and static scoping rules.
The Scheme language is based on only a few language features and
semantic concepts; it consists of a small core if syntactic
forms, a set of extended forms derived from them, and a number
of standard procedures (\f2primitive\fP procedures) that operate
on a rich set of types of objects (among them numbers, lists, vectors,
symbols, characters, and strings).
In 1991 Scheme became an IEEE standard [IEEE\|Std\|1178-1990]
(the standard document, although only 50 pages long,
includes the complete formal semantics of the language).
.PP
The standardization effort has increased the acceptance of Scheme;
for instance, the Extension Language Working Group
of the CAD Framework Initiative has recently selected Scheme as the
extension language for future CAD applications [CFI 1991a, CFI 1991b].
Among the established programming languages we consider Scheme the
ideal candidate for a general extension language \-
it is standardized; its semantics are
well-defined; it has a simple syntax and is easy to implement; and it
is sufficiently small to not dwarf the application to be extended.
.P
Extending the Extension Language
.PP
The implementation of an extension language must itself be
extensible.
To enable code written in such a language to manipulate
objects or state of the application to be extended,
the language's base set of primitive procedures and data types
must be augmented by application-specific primitives and types.
In fact, easy extensibility of the language has been the primary
design consideration in the development of \*E (as opposed to
performance or number of language features).
To allow \*E programs to be expressive in the context of a given
application, application writers are encouraged (and expected)
to extend the language base of \*E by a rich set of
application-specific data types and Scheme primitives to operate on
objects of these types.
Adding new types and primitives to \*E is an inexpensive operation;
it is not uncommon for an application to define hundreds
of application-specific Scheme primitives.
.\"
.\" implementation must fit well to `host language' (schreibt cabo...)
.PP
All primitive procedures of \*E are C or C++ functions.
This is true for both built-in primitives (such as \f2car\fP and \f2cdr\fP)
and primitives defined by extensions.
From the Scheme programmers' point of view, primitives and types from the
base set of the language are indistinguishable from application-specific
primitives and types.
Extensions ``register'' new primitives with the interpreter
by supplying the name of the primitive along with a pointer
to the function implementing the primitive and other information.
New types are defined in a similar way.
Registration of new primitives and types typically takes place at the
time the interpreter is invoked or when a compiled extension is loaded
into the running interpreter.
.PP
Another way to use the extension mechanisms of \*E is to provide
interfaces to libraries, such as the C library or the libraries
of the X window system (e.\|g.\& \f2Xlib\fP).
\*E has no facility to directly import ``foreign'' functions
(although such a facility could be written as an extension;
in fact, this has been done at a company where \*E is used).
Therefore, a small amount of code acting as ``glue'' between \*E and
the library has to be written to make the contents of a library
available to Scheme programmers.
The main purpose of this interface code is to check the arguments
supplied to the library functions, to convert Scheme objects
into C types, and to convert the results of library functions back
into Scheme objects.
Library extensions often act as glue between the application
to be extended and the libraries used by the application; they allow
the application writers to abstract from the details of the
libraries.
Although it is useful to distinguish between \f2library\fP extensions
and extensions interfacing to \f2applications\fP, there is no
technical difference \- in both cases a collection of types
and functions is made available to the Scheme world.
.P
The Contents of the Extension Language Kit
.PP
The \*E distribution consists of the Scheme interpreter (the
\f2interpreter kernel\fP) and a number of library extensions.
These provide interfaces to the X11 ``Xlib'' (similar to
``CLX'' [CLX 1991] in its functionality, but implemented on top
of Xlib), to the X11 toolkit intrinsics (``Xt'')
and the Athena and OSF/Motif widget sets, and to a small number
of functions of the C library.
The X extensions are especially useful for application writers
whose applications have graphical user-interfaces based on X;
\*E enables them to write their user-interfaces or parts thereof
in Scheme to achieve maximum customizability.
.PP
\*E can also be used as a free-standing Scheme implementation.
In combination with the X extensions it is well-suited as a tool
for interactively exploring X, for teaching X to beginners, and
as a platform for rapid prototyping of X-based applications.
.NH
Using Elk in Applications
.P
Bringing Everything Together
.PP
In contrast to other extension language implementations
(e.\|g.\& TCL [Ousterhout 1990]),
\*E does not provide its functionality in the form of a library that is
statically linked into an application to be extended.
Instead, the object modules comprising the application
and all required library extensions are dynamically linked with and
loaded into the running Scheme interpreter.
To accomplish this, the \f2load\fP primitive of \*E has been
extended to load object files \- compiled extensions written
in C or C++ \- besides files containing Scheme code.
Dynamic loading enables applications to load less frequently
used modules into the running program on demand; such an
application is initially smaller than the equivalent statically
linked application (where all modules must be combined into
one large executable file).
.PP
Dynamic loading of object files is often used together with the
\f2dump\fP primitive that creates an executable file from
the running interpreter, similar to \f2unexec\fP of GNU Emacs or
\f2dump\%lisp\fP in some Lisp systems.
The \f2dump\fP primitive of \*E differs from existing, similar
mechanisms in that the newly created executable, when called, starts at
the point where \f2dump\fP was called in the original invocation (as
opposed to the program's \f2main\fP entry point).
Here the return value of \f2dump\fP is ``true'', while in the original
invocation it returns ``false'' (not unlike the UNIX \f2fork\fP system
call).
.P
Dynamic Loading and Dump in Cooperation
.PP
To generate a new instance of an application one would typically
invoke the Scheme interpreter, load all object modules and all Scheme
code required initially, perform all initializations that can survive a
``dump'', and finally dump an image of the running interpreter
containing all the loaded code into a new executable on disk.
The use of \f2dump\fP avoids time-consuming activities like
loading of object files and other initializations on each startup.
The dumped executable, when started, resumes after the call to
\f2dump\fP; at this point one would perform the remaining,
environment-dependent initializations and finally invokes the
application's ``main program'' (e.\|g.\& enter the X toolkit's event
processing main loop).
Listing @L(dump) shows a (slightly simplified) Scheme program that
generates and starts a new instance of an application.
.Ls
.Ss
;;; Load initially required object files and Scheme files of
;;; application and dump image into executable file.
;;; Dumped file enters application's main loop on startup.
.Sl
(load 'main.o) ; initial object modules
(load 'edit.o)
(load 'x11.o) ; (a library extension)
\&...
(load 'ui.scm) ; initial Scheme files
(load 'custom.scm)
(load 'x11.scm)
\&...
(initialize-application)
.Sl
(if (dump 'a.out)
(begin ; dumped a.out starts execution here
(initialize-some-more)
(main-loop-of-application)
(exit)))
.Sl
;; Original invocation gets here when dump is finished. We're done.
.Se
.Lc "Scheme code to generate and start an application"
.Ns
\f2Note:\fP Filenames can be given as symbols (besides the usual string
literals).
A more meaningful name than a.out would probably be chosen in practice.
.Ne
.Le dump
.PP
On systems that do not support dynamic linking and loading of
object files (such as older versions of UNIX System V)
or where \f2dump\fP cannot be implemented,
the interpreter kernel and the application and library extensions
are linked statically and combined into one executable.
.PP
In any case, in an application using \*E, the control initially
rests in the Scheme interpreter.
The interpreter acts as ``main program'' of the application; it is the
interpreter's \f2main()\fP function which is invoked on startup of
the program.
Therefore the first code to execute in an application is Scheme code;
this Scheme code provides the shell functionality of the application
(it is hence called \f2shell code\fP).
The shell code may perform a few simple tasks, for instance, load a
user-provided initialization file containing customization code for
the application and then enter the application's main loop,
or it may be as complex as in ISOTEXT, where the entire X-based
user-interface is written in Scheme.
.P
Making Oneself Known to the Extension Language
.PP
The application, as it is linked with the extension language
interpreter, has full access to all external functions and variables of
the interpreter kernel.
The interpreter, on the other hand, does not have any knowledge of the
contents of dynamically linked and loaded object modules; all it
sees of an object file being loaded is the file's symbol table.
To obtain ``hooks'' into a newly loaded extension, the interpreter
searches the symbol table of each object file being loaded for
functions whose names start with the prefix ``init_''
(\f2extension initialization functions\fP) and invokes these functions
as they are encountered.
Likewise, to support extensions written in C++, any C++ static
constructors found in the symbol table are called.
When linked statically, the interpreter must scan its own symbol
table on startup to find and invoke the initializations functions.
.PP
Besides initializing private data of the modules being loaded,
these initialization functions register with the interpreter
the Scheme primitives and Scheme data types implemented by the extensions.
To enable extensions to register new primitive procedures and types,
the interpreter kernel exports two functions: \f2Define_Primitive()\fP
to register a new Scheme primitive and \f2Define_Type()\fP to
register a new type.
.PP
The arguments supplied to \f2Define_Primitive()\fP are a pointer to the
function implementing the primitive procedure, the Scheme name of the
primitive, the minimum and maximum number of arguments, and
a symbol indicating the \f2calling discipline\fP of the primitive.
Calling disciplines are: normal procedure with fixed number of
arguments, such as \f2car\fP; procedure with variable argument list,
such as \f2append\fP; and \f2special form\fP (variable number of
unevaluated arguments).
\f2Define_Type\fP is invoked with the Scheme name of the type, the size
of the type's representation in C or C++, two functions implementing
the \f2eqv?\fP and \f2equal?\fP predicates for objects of this type, a
function that is called by the interpreter to print an object of the
new type (the type's \f2print function\fP), and a function providing
information about the type to the garbage collector.
The return value of \f2Define_Type()\fP is a ``handle'' to the newly
defined type (basically a small, unique integer); its main uses are to
check the type of arguments supplied to primitive procedures and to
instantiate objects of this type.
.NH
Notes on the Implementation
.P
Implementing Continuations
.PP
Finding a way to efficiently implement Scheme's \f2continuations\fP
called for considerable efforts during the design phase of \*E.
Continuations are a powerful language feature; they support the
definition of arbitrary control structures such as non-local
loop and procedure exits, \f2break\fP and \f2return\fP as in C,
exception handling facilities, explicit backtracking, co-routines,
or multitasking based on \f2engines\fP.
.PP
The primitive procedure
.Ss
\s+1(call-with-current-continuation \f2receiver\fP)\s0
.Se
packages up the current execution state of the program into
an object (the \f2continuation\fP or \f2escape procedure\fP)
and passes this object as an argument to \f2receiver\fP (which is
a procedure of one argument).
Continuations are first-class objects in Scheme; they are
represented as procedures of one argument (not to be confused
with the \f2receiver\fP procedure).
Each time a continuation procedure is called with a value,
it causes this value to be returned as the result of the
\f2call-with-current-continuation\fP expression which created this
continuation.
If the procedure \f2receiver\fP terminates normally (i.\|e.\& does
not invoke the continuation given to it), the value
returned by \f2call-with-current-continuation\fP is the return
value of \f2receiver\fP.
.PP
As long as the use of a continuation is confined to the runtime
of the \f2receiver\fP procedure, \f2call-with-current-continuation\fP
is similar in its functionality to \f2catch/throw\fP in most
Lisp dialects or \f2setjmp/longjmp\fP in C.
However, continuations, like all procedures in Scheme, have
indefinite extent; they can be stored in variables and called
an arbitrary number of times, even after the \f2receiver\fP and
the enclosing \f2call-with-current-continuation\fP have already
terminated.
Listing @L(call-cc) shows a program fragment where continuations
are used to get back an arbitrary number of times into the middle
of an expression whose computation has already been completed.
While not particularly useful, this example demonstrates that
continuations can be used to build control structures that
cannot be implemented by means of less general language features like
catch/throw or setjmp/longjmp in C.
.Ls
.Ss
(define function
(lambda (n m)
(+ n (mark m))) ; return n+m
.Sl
(define get-back "uninitialized")
.Sl
(define mark ; identity function, but also
(lambda (value) ; assign current continuation
(call-with-current-continuation ; to a global variable
(lambda (continuation)
(set! get-back continuation)
value))))
.Sl
.Sl
(function 10 20) \z\(-> 30 ; invoke function
(get-back 5) \z\(-> 15 ; resume with new value
(get-back 0) \z\(-> 10 ; ...once more
.Se
.Lc "Using continuations with unlimited extent"
.Le call-cc
.PP
The different approaches that can be used in implementing
continuations are intimately tied to the strategies used for
interpreting the language itself.
Scheme interpreters generally employ a lexical analyzer and parser
\- the \f2reader\fP \- to read and parse the Scheme source code and
produce an intermediate representation of the program.
During this phase, symbols are collected in a global hash table
(in Lisp jargon, the symbols are \f2interned\fP), and a tree
structure representing the program's S-expressions is built up
on the heap of the interpreter.
The majority of interpreters compile this intermediate representation
into an abstract machine language (such as \f2byte code\fP) in a
second pass (in practice, these passes may be combined in one pass).
The evaluator is then implemented as an abstract machine which interprets
the low-level language; this machine \- usually a simple stack
machine \- may even be implemented in hardware.
.PP
In an abstract machine implementation, the obvious approach to implement
\f2call-with-current-continuation\fP is to package up the contents of
the registers (program counter, stack pointer, etc.) and the current
runtime stack of the abstract machine.
As continuations have indefinite extent, it would not suffice to just
capture the abstract machine's registers (as the C library function
\f2setjmp\fP does for the real machine).
To be able to continue the evaluation of procedures that have
already returned and whose frames are therefore no longer on the stack,
a continuation must also embody the contents of the abstract
machine's stack at the time it is created.
When a continuation is applied, the machine resumes the ``frozen''
computation by restoring the saved registers and stack contents
of the abstract machine.
.PP
This technique would not work in \*E, because at the time a
continuation is created, arbitrary library functions may be
active in addition to Scheme primitives.
For instance, consider the extension interfacing \*E to the
``Xt'' toolkit intrinsics of the X window system.
When using this extension, a typical scenario is that some Scheme
procedure invokes the primitive that enters the toolkit's event
dispatching main loop (\f2XtAppMainLoop()\fP).
When an event arrives (for example, a mouse button press event),
the toolkit's main loop invokes a callback function, which in turn
calls a user-supplied Scheme procedure to be executed when a
mouse button is pressed.
This Scheme procedure might in turn invoke yet another function
from the ``Xt'' library, and so on.
A similar example would be a \f2qsort\fP or \f2ftw\fP extension to \*E,
where the user-supplied function called by the \f2qsort()\fP or
\f2ftw()\fP C library function would invoke a procedure written
in Scheme.
.PP
The interpreter's thread of execution at any time apparently involves
both Scheme primitives and library functions (such as
\f2XtAppMainLoop()\fP and \f2qsort()\fP in the examples above) in an
arbitrary combination.
Therefore, a continuation must not only embody the execution
state of the active Scheme procedures, but also that of the
currently active library functions (such as local variables
used by the library functions).
In the approach followed by \*E, a continuation is created by capturing
the machine's registers \- like \f2setjmp\fP in C does \- and the C
runtime stack.
When a continuation is applied later, the registers and the saved
stack contents are copied back.
Actually, we did not follow the usual ``abstract machine''
technique in \*E at all; instead, the Scheme evaluator directly
interprets the intermediate representation produced by the reader.
In a sense, it is the ``real'' machine (the hardware on which \*E
is executed) that plays the role of the abstract machine in
implementations with byte-code compilation.
.PP
Although the abstract machine technique usually yields faster
execution of Scheme code, the performance of \*E is comparable to
that of existing interpreters employing the abstract machine approach,
and the implementation of \*E is less complex than that of comparable
interpreters using byte-code compilation.
While the technique to implement continuations in \*E is not
portable \- it is based on certain assumptions on the machine's stack
layout and the C compiler and runtime environment \-
implementations of the small machine-dependent part now exist for
most major machine architectures.
.P
The Implementation of ``dump''
.PP
Continuations provide a natural basis for implementing the
execution-state preserving semantics of the \f2dump\fP primitive.
When called, \f2dump\fP invokes \f2call-with-current-continuation\fP
(actually, since it is written in C, the interpreter's internal version
of \f2call-with-current-continuation\fP).
The real work is done in the \f2receiver\fP procedure; it stores the
newly created continuation into a global variable, creates an
executable file from the image of the running process, sets a
global \f2was-dumped\fP flag to indicate that a dump has taken place,
and finally returns ``false''.
The return value of the \f2dump\fP primitive is the return value
of this call to \f2call-with-current-continuation\fP, i.\|e.\&
``false'' if a dump has just been performed.
.PP
When the interpreter \- either the original program or a dumped
executable \- is started, it examines the \f2was-dumped\fP flag
as its very first action.
If the flag is set, the running interpreter was started from a
dumped executable.
In this case the interpreter immediately invokes with an argument of
``true'' the continuation that was saved away by a call to \f2dump\fP;
this causes that call to \f2dump\fP to finish and return ``true'' to
its caller.
If, on the other hand, the \f2was-dumped\fP flag is not set (i.\|e.\&
the running process was not started from a dumped image), the
interpreter initializes and starts up as usual.
.PP
Before writing an image of the running process to disk, \f2dump\fP
has to close all open Scheme file ports, as open file descriptors would
not survive a \f2dump\fP \- they would no longer be valid in the
dumped executable.
Generally, this is true for all objects pointing to information
maintained by the UNIX kernel, such as the current directory, the
current signal dispositions, resource limits, or interval timers.
Users and implementors of \*E extensions must be aware of this
particular restriction.
For instance, users of the X11 extensions have to make sure that,
if \f2dump\fP is to be used, connections to X-displays are only
established in the dumped invocation.
.PP
To be able create an executable from the running process, \f2dump\fP
has to open and read the a.out file from which the running process was
started (actually, if the system linker has been called to dynamically
load object files, the output of the most recent invocation of the
linker is used instead of the original a.out).
The symbol table of the new executable is copied from the a.out file of
the running program; in addition, the a.out header has to be read to
obtain the length of the text segment and the start of the data segment
of the running process.
To do so, \f2dump\fP has to determine the filename of a.out file from
which the process was started based on the information in \f2argv[0]\fP
and in the PATH environment variable.
This approach is obviously based on several prerequisites: \f2dump\fP
must be able to access its a.out file (\f2argv[0]\fP must carry
meaningful information; the file must be readable) and the running
program's a.out file must not have been stripped.
It would have been advantageous for the implementation of \f2dump\fP
if the entire a.out file had been automatically mapped into memory
on startup, like it is done, for instance, in NEXT-OS/Mach.
.PP
\f2dump\fP combines the data segment and the ``bss'' segment of the
running process into the data segment of the new executable.
If \*E had a separate heap for storing constant objects (future
versions may have one),
\f2dump\fP would place this read-only part of the memory into the new
executable's text segment to make it sharable.
When the interpreter's heap is written to disk, \f2dump\fP seeks
over the unused portions of the heap, so that fake blocks can be
used for these parts of the file.
This results in a considerable conservation of disk space in
the final executable, as at least half of the interpreter's
heap is unused at any time due to the garbage collection
algorithm of \*E.
.PP
Since the a.out formats used in the numerous
versions of UNIX differ vastly, \*E has to include separate
implementations of \f2dump\fP for the currently supported
a.out formats.
Version 1.5 of \*E handles the BSD-style a.out format used
in BSD and ``derived'' UNIX versions (such as SunOS 4.1),
the COFF a.out format (used in older releases of UNIX System V
and in A/UX),
Extended COFF of MIPS-based computers running Ultrix, and
the ELF a.out format of System V Release 4 and related UNIX
versions (Solaris 2.0).
.\"
.\" stack alignment problem (alloca in main)
.P
Dynamic Loading of Object Files
.PP
When loading an object file during runtime, addresses
within this object file must be relocated to its new location
in the program's address space.
To allow extensions to directly reference objects of the interpreter
kernel, such as the heap and the built-in primitives, unresolved
references into the \f2base program\fP must be resolved during
dynamic loading.
Finally, the object file needs to be able to export its entry points
(such as \*E's extension initialization functions) to the base program.
.PP
More than one object file may have to be loaded into one invocation
of \*E.
To manage non-trivial, hierarchically structured sets of extensions,
where a number of high-level extensions require one or more lower-level
extensions to be loaded, it is essential that object files loaded later
can make use of the symbols defined by previously loaded object files.
As this style of dynamic loading allows building complex systems from
small components incrementally, we will use the term \f2incremental
loading\fP.
.PP
With the advent of 4.0\|BSD in 1980 [Joy 1980],
support for incremental
loading was added to the system linker and has since been supported by
most major UNIX variants:
when the -A option and the name of the base executable are supplied to the
linker, linking is performed in a way that the object file produced by
the linker can be read into the already running executable.
The symbol table of the resulting object file is a combination of the
symbols defined by the base program and the newly defined symbols added
by the linking process, from the object file or from libraries used in
linking.
Only this newly linked code and data is entered into the
resulting object file.
The incremental style of dynamic loading is achieved by saving
the resulting output file each time the linker is invoked and using
this file as the base program for the next incremental loading step,
such that both old and new symbols can be referenced.
.PP
Incremental loading is generally supported by the linkers of UNIX
versions that use the BSD-style a.out format and by those of several
UNIX systems based on more modern a.out formats (e.\|g.\& Ultrix).
It is not supported by any existing release of UNIX System V.
Some newer UNIX versions that have shared libraries and dynamic linking
(such as System V Release 4 or SunOS) offer a library interface to
the dynamic linker.
In some systems this kind of interface is intended to replace the
incremental loading functionality of the system linker.
These dynamic linker interfaces usually come in the form of a library that
exports functions such as \f2dlopen()\fP to map a shared object module or
shared library into the address space of the caller (the base program)
and \f2dlsym()\fP to obtain the address of a function or data item in
the newly attached object module.
.PP
In some implementations, object files attached through \f2dlopen()\fP may
directly reference symbols in the base program; in other implementations
they may not.
In any case, object files cannot directly reference symbols defined
by objects that have been placed into the program by previous calls
to \f2dlopen()\fP (only, if at all, indirectly by calling \f2dlsym()\fP).
Thus, these dynamic linker interfaces are clearly inferior to
incremental loading, as they lack the important capability to
load a set of object files \f2incrementally\fP.
Vendors who have replaced ``/bin/ld -A'' by a \f2dlopen\fP-style library
in their UNIX systems, or who intend to do so, do not seem to be
aware of the fact that this change will break applications that
rely on incremental loading.
.PP
For \*E, the consequence of being restricted to dynamic linker
interfaces of that kind is that, except for the simplest applications,
one must pre-link all possible combinations of extensions that are
not completely independent of each other.
In general, given a set of \f2n\fP extensions each of which can be
based on one out of \f2m\fP other extensions, this means having to prepare
and keep around \f2n\|\(mu\|m\fP pre-linked object files; not to
mention the contortions one has to go through when the hierarchy of
extensions has a depth greater than two (not an unlikely scenario in
practice).
If the number of extensions and relations between them is larger than
trivial, or if the extensions are large or require large libraries,
keeping around all pre-linked combinations of object modules will
waste a considerable amount of disk space.
.PP
Another, although minor, problem with these dynamic linker interfaces
is that they usually offer only a simple-minded function (such as
\f2dlsym()\fP) to look up the address of a specific symbol of a newly
accessed object module (typically some kind of module initialization
function); but they do not provide a way to scan all newly defined
symbols.
This functionality is insufficient to implement extension
initialization in \*E, where a dynamically loadable extension often is
composed from a number of small modules, each defining its own
initialization function.
Requiring a single, common initialization function name for the entire
object file implies that (often configuration-dependent) ``glue code''
must be added to call all the individual initialization functions,
including the C++ static constructors.
.P
Non-Standard Language Features
.PP
As the current version of the Scheme standard (deliberately) does not
specify several important language issues, such as error handling or
syntactic extensions, we have added a number of non-standard language
features to the Scheme interpreter of \*E to fill some of the holes.
.PP
A proposal for a macro extension has only recently been
added as an addendum to the \f2Revised\u\s-1\|4\s0\d Report on the
Algorithmic Language Scheme\fP [Clinger et al. 1991] and is still being
discussed controversially within the Scheme community.
To avoid having to wait for a final version of a macro system to
evolve and be included in the Scheme standard, we implemented a
simple-minded macro mechanism in \*E that resembles the macro
facilities offered by various existing Scheme and Lisp systems.
.PP
One area where the Scheme standard does not specify any language
features yet is error and exception handling; the standard merely
states which error situations a conforming implementation is
required to detect and report.
Since it is essential for a non-trivial application to be able to
gracefully handle error situations (such as failures in interactions
with the operating system) and other exceptional conditions, we have
added a simple error and exception handling facility to \*E.
.PP
When an error is detected by the interpreter, a user-supplied
error handling procedure is invoked with arguments identifying the
type and source of the error.
The standard top-level of \*E provides a default error handler that
prints an error message and then resumes the main read-eval-print loop
by means of a \f2reset\fP primitive.
Most primitives of \*E and the extensions use this error handling
facility to signal an error, as opposed to indicating failure by
a distinctive return value (which would be prone to being ignored).
To by-pass the standard error handler and ``catch'' failure of
a particular primitive, programs may enclose the call to the
primitive by \f2call-with-current-continuation\fP and use
\f2fluid-let\fP to dynamically bind the error handler to the
continuation (as shown in listing @L(errset)).
.Ls
.Ss
(define (new-open-input-file name)
(call-with-current-continuation
(lambda (return)
(fluid-let ((error-handler
(lambda args (return #f))))
(open-input-file name)))))
.Se
.Lc "A version of open-input-file that returns the newly opened port \
on success, #f on error"
.Le errset
.PP
\*E provides a similar facility to handle an \f2interrupt\fP exception:
a user-supplied interrupt handler is invoked when a SIGINT signal is sent
to the interpreter (usually by typing the interrupt character on the
keyboard).
Support for other exceptions, such as timer interrupts, may be provided
in future versions.
.PP
Another non-standard primitive that facilitates handling of errors is
\f2dynamic-wind\fP, a generalization of the \f2unwind-protect\fP form
offered by many Lisp dialects.
\f2dynamic-wind\fP is used to implement the \f2fluid-let\fP special
form (to create \f2fluid\fP or dynamic variable bindings).
Both \f2dynamic-wind\fP and \f2fluid-let\fP are also provided by
several other Scheme dialects [MIT 1984, Dybvig 1987].
.PP
The current version of the Scheme standard does not provide any
language features that would make it possible to implement a useful
Scheme debugger (apart from a debugger based on source code
instrumentation).
To compensate this shortcoming, we have added a few primitives that
aid the implementation of a simple interactive debugger, among them an
\f2eval\fP primitive (although, theoretically, \f2eval\fP could be
implemented by writing an expression into a temporary file and then
loading this file).
In addition, \*E, like a few other Scheme dialects, provides lexical
environments as first class (but immutable) objects.
Other non-standard primitives that are helpful in debugging are
\f2procedure-lambda\fP to obtain the lambda expression that evaluated
to a given procedure, and a primitive that returns the list of
currently active procedures together with their actual arguments and
the lexical environments in which the procedure calls took place
(a \f2backtrace\fP).
.\"
.\" provide, require; autoloading
.P
Garbage Collection
.PP
The garbage collector of \*E is based on the \f2stop-and-copy\fP
algorithm; a description of this algorithm can be found, among
other places, in [Abelson et al. 1985].
An incremental, generational garbage collector for \*E has recently
been adapted from Yip's garbage collector [Yip 1991] and is now being
integrated into \*E as an alternative to the stop-and-copy garbage
collector.
.PP
Extensions to \*E can register \f2before-GC\fP and \f2after-GC\fP
functions with the interpreter; these functions are invoked by the
garbage collector immediately before and after each garbage collection
run.
Within \f2after-GC\fP functions, extension can determine whether or not
a particular Scheme object has been moved by the preceding garbage
collection run.
An object has not been moved by the garbage collector if no references
to the object exist any longer, i.\|e.\& if it has become garbage.
In this case, an extension may perform some kind of clean-up action;
for example, if the now unreferenced object contains a handle to an open
file, close this file.
.PP
The \*E distribution contains a library based on this mechanism that
enables extensions to register a \f2termination function\fP for
objects of a particular type.
The termination function associated with an object is then invoked
by the garbage collector automatically when this object has been
detected to be unused.
The Xlib extension of \*E uses this library to perform suitable
finalization operations on objects created by the extensions, for
example, close windows, unload fonts, and free colormap objects that
have become unreferenced.
This mechanism is slightly complicated by the fact that objects may
have to be terminated in a predefined order; for instance, when an
X11 display becomes garbage, all objects associated with this
display must be terminated before the display itself is finally closed.
.NH
Practical Experiences with Elk
.P
Elk and ISOTEXT
.PP
In developing the ODA-based document processing system ISOTEXT, Elk
proved to be a major asset [Bormann 1991].
Scheme was used as the implementation language for all user interface
aspects of ISOTEXT.
Apart from providing extensibility to users of ISOTEXT, using Elk as
the base for ISOTEXT made it possible to write the shell code in a high
level language with all its amenities, e.\|g.\& automatic storage
reclamation.
As no recompilation and relinking is necessary, it is a quick operation
to apply and test changes to the user interface.
.PP
Elk provides for a strong ``firewall'' in the ISOTEXT system:
bugs in the Scheme code give rise to errors at the Scheme level, which can
easily be debugged using the (primitive, but functional) built-in
debugger of Elk, while conditions such as core dumps always are the
result of bugs in the ISOTEXT kernel implementation.
.PP
All this assistance for the development of ISOTEXT could be
achieved without sacrificing the performance of the ISOTEXT kernel
system, which is still written in efficient C++.
.PP
Elk also allowed to isolate the ISOTEXT kernel from the choice of an
X toolkit:
The ISOTEXT kernel is unaware of the toolkit being used (``Xt'' with
OSF/Motif).
The Scheme code builds a user interface using the Motif library
interface and provides X windows to the ISOTEXT kernel.
Input is processed by the Scheme code which calls editor primitives
provided by the ISOTEXT kernel and schedules redisplay operations.
Replacing Xt and OSF/Motif by e.\|g.\& \f2Xview\fP would require no
changes in the ISOTEXT kernel.
.PP
During the development of ISOTEXT it turned out that the extension
writer's interface of Elk could benefit from a number of improvements.
The main problem is that it is difficult to write non-trivial
extensions, because too much of the inner workings of the interpreter
is exposed to and must be dealt with by the extension writer.
.PP
In particular, as Elk can trigger a garbage collection run at any
time a chunk of heap space is requested, extensions must register
local or temporary Scheme object with the garbage collector to protect
them from being discarded during a subsequent GC run.
While this scheme has the advantage that maximum utilization of the
available heap space is guaranteed, it imposes a strict discipline
on the extension programmer.
Failure to properly protect temporary Scheme objects usually results
in delayed crashes of the application that hard to trace back to the
actual source of the problem.
In fact, when developing the X11 extensions to Elk, most of the time
spent for debugging was due to GC-related bugs.
.\"
.\"(cabo: recruiting problems)
.P
Library Extensions
.PP
The problems we encountered when designing
and implementing Elk's interfaces to the C libraries of X11
are likely to be applicable to a wide range of similar APIs.
The X11 libraries, especially Xlib, are quite complex; the core Xlib
alone exports more than 600 functions and macros, within which
numerous different mechanisms are employed for passing arguments and
for manipulating objects, some of which are perceived to be rather
verbose and error-prone by many programmers.
This complexity is, at least partly, caused by the semantic
restrictiveness of the C programming language.
Thus, when designing the Scheme language interface, we had the
opportunity to eliminate some of the ``warts''.
.PP
If integration of a library with an extension language (or interactive
language in general) is not anticipated at the time the programmer's
interface of the library is designed, writing a properly functioning
extension language interface to this library can become quite difficult
or even impossible.
This problem is exemplified by the ``Xt'' toolkit intrinsics library
of X11, in particular by earlier versions of this library.
The following example illustrates a typical difficulty caused by
the ``static'' nature of the programmer's interface to ``Xt'':
.PP
Each class of graphical objects (\f2widgets\fP in ``Xt'' terminology)
exports a list of attributes (\f2resources\fP) that are associated with
objects of this class.
A function is provided by ``Xt'' to obtain the list of resources of a
widget class together with the name and C type (integer, string,
pixmap, color, etc.) of each resource.
On this basis, operations like setting the value of a widget's resource
from within Scheme can be implemented in a straightforward way.
The ``Xt'' extension just has to check if the user-supplied Scheme
value can be converted into a C object of the resource's type, perform
this conversion, and call the Xt-function to set the resource, or
complain to the user if the value is not suitable for this resource.
However, until recently, some classes of widgets had a subset of
resources (the \f2constraint resources\fP) whose names and types
could not be obtained by an ``Xt'' application.
While this omission is usually not perceived as a problem for C
programmers (who would know each widget's resources \f2a priori\fP from
reading the documentation), it had a dramatic effect on Elk's ``Xt''
extension, as now the knowledge about these resources had to be
hard-wired into the extension.
As a result, the extension's source code had to be modified for each
new widget set to be made usable from within Scheme code.
.PP
This particular problem has been remedied in recent releases of X11,
though several similar problems remain; even in the UNIX C library.
While design flaws of library interface often go unnoticed or are
considered minor when writing C or C++ programs (e.\|g.\& the fact
that implementations of the \f2qsort()\fP functions are
non-reentrant), they become crucial when these libraries are made
accessible to an extension language.
As the importance of extension languages is growing, it is essential
that future library interfaces will be designed with the particular
requirements of extensions languages in mind.
.PP
.\" automatic generation of interfaces / foreign functions
.P
Conclusions
.PP
Since the Elk project was started, both the research community and
significant industry projects have generated increasing numbers of
``embeddable language'' implementations.
While many such languages inherit the syntactic flavor of BASIC, those
projects that focus on the ability to build non-trivial extensions
recently seem to almost exclusively turn to the Scheme language.
.PP
Scheme has proven to be an effective language for extension language
purposes.
In the beginning of the ISOTEXT project, there were concerns that an
implementation of the full Scheme language would be both too large and
too slow.
These reservations proved to be unfounded:
the code of Elk has less than half the size of medium size
applications such as \f2vi\fP.
While the performance of Elk may be uninspiring (no compiler is
available), this has proven not to be a critical issue, as any
bottlenecks can easily be replaced by a primitive recoded in C or C++.
.PP
While Elk has been used in the ISOTEXT project since 1987, legal
issues prevented making it publicly available until the fall of 1989.
Since, Elk has gained acceptance, in fact sufficient momentum to
encourage others to contribute software.
We know of projects that make use of Elk at research institutions such
as ... as well as in industry work at Autodesk, Computervision,
HP, IBM, Intel, NEC, and SNI.
.P
Availability
.PP
Elk is available in legally unencumbered status.
The current version as of September 1992 is 1.5.
The newest version of Elk is available via anonymous FTP from
export.lcs.mit.edu (/contrib) and ftp.cs.tu-berlin.de (/pub).
.NH
References
.IP "[Abelson et al. 1985]
Harold Abelson and Gerald J. Sussman with Julie Sussman,
\f2Structure and Interpretation of Computer Programs\fP,
MIT Press, Cambridge, Mass., 1985.
.\"
.IP "[Bormann et. al. 1988]
Ute Bormann, Carsten Bormann, C. Bathe,
\f2SDE \- A WYSIWYG Editing and Formatting System for ODA and
SGML Documents\fP,
ESPRIT '88, Proceedings of the 5th Annual ESPRIT Conference,
Brussels, November 14-17, 1988.
.\"
.IP "[Bormann 1991]"
Carsten Bormann,
\f2Open Document Processing and the ISOTEXT System\fP,
Doctoral Dissertation, TU-Berlin, 1991.
.\"
.IP "[CFI 1991a]"
CAD Framework Initiative, CFI Extension Language Sub-Committee,
\f2CFI Extension Language Selection Document\fP,
CFI Document Number 87, CAD Framework Initiative Inc., Austin, Texas, 1991.
.\"
.IP "[CFI 1991b]"
CAD Framework Initiative, Extension Language Working Group:
Architecture Technical Sub-Committee,
\f2Extension Language: Core Language Selection\fP,
Draft Proposal Version 0.7, CFI Document Number ARCH-91-G-1,
CAD Framework Initiative Inc., Austin, Texas, 1991.
.\"
.IP "[Clinger et al. 1991]"
William Clinger and Jonathan Rees (Editors),
\f2Revised\u\s-1\|4\s0\d Report on the Algorithmic Language Scheme\fP,
November 2, 1991.
Available per FTP from altdorf.ai.mit.edu.
.\"
.IP "[CLX 1991]"
\f2CLX \- Common LISP X Interface\fP, 1991.
(Part of the X11 Release 5 distribution available from the
MIT software distribution center.)
.\"
.IP "[Dybvig 1987]"
R. Kent Dybvig,
\f2The Scheme Programming Language\fP,
Prentice Hall, Englewood Cliffs, NJ, 1987.
.\"
.IP "[Hansen 1990]"
Wilfred J. Hansen,
\f2Enhancing documents with embedded programs: How Ness extends
insets in the Andrew ToolKit\fP,
Proceedings of IEEE Computer Society 1990 International
Conference on Computer Languages,
March 12-15, 1990, New Orleans.
.\"
.IP "[IEEE\|Std\|1178-1990]"
\f2IEEE Standard for the Scheme Programming Language\fP,
New York, May 28, 1991.
.\"
.IP "[Joy 1980]"
Bill Joy,
\f2Changes in the VAX system in the Fourth Berkeley Distribution\fP,
Computer Systems Research Group,
University of California, Berkeley,
November 1980.
.\"
.IP "[Lewis et al. 1990]"
Bil Lewis, Dan LaLiberte, the GNU Manual Group,
\f2GNU Emacs Lisp Reference Manual\fP,
Edition 1.03, Free Software Foundation, Cambridge, Mass.,
December 1990.
.\"
.IP "[MIT 1984]"
\f2MIT Scheme Manual, Seventh Edition\fP,
Department of Electrical Engineering and Computer Science,
Massachusetts Institute of Technology,
Cambridge, Mass., September 1984.
.\"
.IP "[Ossanna 1979]"
J. F. Ossanna,
\f2Nroff/Troff User's Manual\fP,
UNIX Programmer's Manual, Seventh Edition, vol. 2,
Bell Telephone Laboratories, Murray Hill, NJ, January 1979.
.\"
.IP "[Ousterhout 1990]"
John K. Ousterhout,
\f2Tcl: An Embeddable Command Language\fP,
Proceedings of the USENIX 1990 Winter Conference,
January 1990, pp. 133-146.
.\"
.IP "[Scheifler et al. 1986]"
Robert W. Scheifler and Jim Gettys,
\f2The X Window System\fP,
ACM Transactions on Graphics, vol. 5, no. 2, pp. 79-109, 1986.
.\"
.IP "[Springer et al. 1989]"
George Springer and Daniel O. Friedman,
\f2Scheme and the Art of Programming\fP,
MIT Press, Cambridge, Mass., 1989.
.\"
.IP "[Stallman 1981]"
Richard M. Stallman,
\f2EMACS \- The Extensible, Customizable, Self-documenting Display
Editor Production System\fP,
SIGPLAN Notices, vol. 16, no. 6, pp. 147-156, Association for
Computing Machinery, New York, 1981.
.\"
.IP "[Yip 1991]"
G. May Yip,
\f2Incremental, Generational Mostly-Copying Garbage Collection in
Uncooperative Environments\fP,
WRL Research Report 91/8,
DEC Western Research Laboratory,
Palo Alto, California, 1991.
.bp
.SH
Appendix A: Extending Elk \- An Example
.P
The ``ndbm'' Library Extension
.PP
The extensibility mechanisms of \*E can be demonstrated best by
a simple library extension.
Consider the \f2ndbm\fP library that is available on most versions
of UNIX.
This library implements functions to maintain a simple database
file of key/contents pairs.
.PP
As shown in Listing @L(ndbm), both the keys and the data to be stored
are described by the type \f2datum\fP; it consists of the data
(a string of bytes) and the length of the data.
\f2dbm_open()\fP opens a database file and returns a handle
to that file to be used in subsequent operations on that
database (a pointer to an opaque data type, similar to the \f2fopen\fP
and \f2readdir\fP interfaces); it returns a null pointer if the
file could not be opened.
A database is closed by a call to \f2dbm_close()\fP.
The data stored under a given key is accessed by the function
\f2dbm_fetch()\fP; it returns an object of type \f2datum\fP
(with a null \f2dptr\fP if the key could not be found).
\f2dbm_store\fP is used to insert an entry into a database and
to modify an existing entry; it returns zero on success and a
non-zero value on error.
.Ls
.Ss
#include <ndbm.h>
.Sl
.Sl
typedef struct {
char *dptr;
int dsize;
} datum;
.Sl
.Sl
DBM *dbm_open(char *file, int flags, int mode);
.Sl
void dbm_close(DBM *db);
.Sl
datum dbm_fetch(DBM *db, datum key);
.Sl
int dbm_store(DBM *db, datum key, datum data, int flags);
.Se
.Lc "The \f2ndbm\fP library"
.Ns
\f2Note:\fP For simplicity, several functions have been omitted.
.br
The \f2flags\fP and \f2mode\fP arguments of \f2dbm_open\fP are that
of the \f2open\fP system call.
The \f2flags\fP argument of \f2dbm_store\fP can be DBM_INSERT to
insert a new entry into the database or DBM_REPLACE to change
an existing entry.
.Ne
.Le ndbm
.PP
The straightforward way to write an \f2ndbm\fP extension to \*E is to
provide a new Scheme data type \f2dbm-file\fP together with the
obligatory type predicate \f2dbm-file?\fP and the Scheme primitive
procedures \f2dbm-open\fP, \f2dbm-close\fP, \f2dbm-fetch\fP and
\f2dbm-store\fP that operate on objects of type \f2dbm-file\fP.
.PP
\f2dbm-open\fP receives the filename (a string or a symbol);
the second argument is one of the symbols \f2reader\fP (open
the file read-only), \f2writer\fP (read and write access), and
\f2create\fP (read and write access, create new file if it does
not exist).
The optional filemode argument is an integer.
\f2dbm-open\fP returns an object of type \f2dbm-file\fP or #f
(false) if the file could not be opened.
\f2dbm-close\fP closes the database file associated with its
argument of type \f2dbm-file\fP.
It returns a non-printing object.
.PP
\f2dbm-fetch\fP expects a \f2dbm-file\fP and a string argument
(the key to be searched) and returns a string (the data stored
under the key) or #f if the key does not exist.
Note that in \*E strings may contain arbitrary 8-bit characters,
including the null byte.
\f2dbm-store\fP is called with a \f2dbm-file\fP, two strings
(key and data) and one of the symbols \f2insert\fP and \f2replace\fP.
Its integer return value is the return value of \f2dbm_store()\fP.
.PP
These procedures and the new \f2dbm-file\fP type can be used
by application programmers to manipulate database files
in those parts of their applications that are written in Scheme.
Listing @L(ndbm-example) shows a small example.
.Ls
.Ss
(define expand-mail-alias
(lambda (alias)
(let ((d (dbm-open "/etc/aliases" 'reader)))
(if (not d)
(error 'expand-mail-alias "cannot open database"))
(unwind-protect
(dbm-fetch d alias)
(dbm-close d)))))
.Sl
(define address-of-staff (expand-mail-alias "staff"))
.Se
.Lc "Using the ndbm extension"
.Ns
\f2Note:\fP The \f2unwind-protect\fP and the \f2error\fP form
are not present in standard Scheme.
.Ne
.Le ndbm-example
.P
The Anatomy of a Scheme Type
.PP
Listing @L(ndbm-skeleton) shows the part of the extension that deals with
the new data type \f2dbm-file\fP and the extension initialization
function.
The variable \f2T_Dbm\fP will hold the unique identifier of the
newly defined type.
The structure \f2S_Dbm\fP defines the C representation of the type;
one such C structure is declared for each composite Scheme type.
Its main component is the handle of the database file that is
contained in each object of type \f2dbm-file\fP.
.Ls
.Ss
#include <scheme.h>
#include <ndbm.h>
.Sl
int T_Dbm;
.Sl
struct S_Dbm {
DBM *dbm;
char alive; /* 0 or 1 */
};
.Sl
#define DBMF(obj) ((struct S_Dbm *)POINTER(obj))
.Sl
int Dbm_Equal(a, b) Object a, b; {
return DBMF(a)->alive && DBMF(b)->alive && DBMF(a)->dbm == DBMF(b)->dbm;
}
.Sl
void Dbm_Print(d, port) Object d, port; {
Printf(port, "#[dbm-file %lu]", DBMF(d)->dbm);
}
.Sl
Object P_Is_Dbm(x) Object x; {
return TYPE(x) == T_Dbm ? True : False;
}
.Sl
void init_dbm() {
Define_Primitive(P_Is_Dbm, "dbm-file?", 1, 1, EVAL);
Define_Primitive(P_Dbm_Open, "dbm-open", 2, 3, VARARGS);
Define_Primitive(P_Dbm_Close, "dbm-close", 1, 1, EVAL);
Define_Primitive(P_Dbm_Store, "dbm-store", 4, 4, EVAL);
Define_Primitive(P_Dbm_Fetch, "dbm-fetch", 2, 2, EVAL);
.Sl
T_Dbm = Define_Type("dbm-file", sizeof(struct S_Dbm),
Dbm_Equal, Dbm_Equal, Dbm_Print, NOFUNC);
}
.Se
.Lc "Skeleton of the ndbm extension"
.Ns
\f2Note:\fP For simplicity some details have been omitted in this
listing, and the calling interface of some functions has been
simplified; the program would not compile in this form.
A working \f2gdbm\fP (GNU dbm) extension is included in the \*E
distribution.
.Ne
.Le ndbm-skeleton
.PP
Scheme objects can usually live longer than their underlying
C objects.
In case of the \f2dbm-file\fP type, a Scheme object of that type
can apparently still be used after its database handle has been
closed by a call to \f2dbm-close\fP.
As \*E extensions must not crash the application, we must prevent
such stale objects from being used in further calls to
\f2dbm-fetch\fP, \f2dbm-store\fP, and \f2dbm-close\fP.
One way to achieve this is to record in each Scheme object whether
the underlying C object is still alive or has been terminated.
The boolean component \f2alive\fP in the \f2dbm-file\fP type
serves this purpose.
It is initialized with true and is set to false in \f2dbm-close\fP.
Further operations on objects with \f2alive\fP being false are
rejected.
.PP
The interpreter stores all Scheme objects in variables of type
\f2Object\fP.
An \f2Object\fP is typically a 32-bit value; it is composed of
a \f2tag\fP part and a \f2pointer\fP part.
The \f2tag\fP part indicates the type of the object, and the remaining
bits hold the actual memory address of the object (they point into
the interpreter's heap).
The macros \f2TYPE\fP and \f2POINTER\fP are provided to extract
the fields of an \f2Object\fP.
Each type definition must define a macro to extract the object's
memory address from an \f2Object\fP (by means of \f2POINTER\fP)
and then cast it into a pointer to the underlying C structure
(see \f2#define DBMF\fP in Listing @L(ndbm-skeleton)).
.PP
\f2Dbm_Equal()\fP implements both the \f2eqv?\fP and the \f2equal?\fP
predicates for \f2dbm-file\fP objects; it returns true if both objects
being compared are alive and contain identical \f2DBM\fP handles.
.PP
\f2Dbm_Print()\fP is called by the interpreter each time an object
of type \f2dbm-file\fP is to be printed; it is invoked with the
object and the Scheme port to which the output is to be sent.
.PP
\f2P_Is_Dbm()\fP implements the primitive procedure \f2dbm-file?\fP
(the type predicate).
As with all primitives, it receives arguments of type \f2Object\fP and
returns an \f2Object\fP, and it has a name beginning with ``P_''.
.PP
The definition of the initialization function \f2init_dbm()\fP
is straightforward; it invokes \f2Define_Primitive()\fP once
for each primitive procedure and finally \f2Define_Type()\fP to
make the new type known to the interpreter.
.P
Primitive Procedures \- The Details
.PP
Listing @L(dbm-open) gives the definitions of the primitives \f2dbm-open\fP
and \f2dbm-close\fP.
.Ls
.Ss
static SYMDESCR Flag_Syms[] = {
{ "reader", O_RDONLY },
{ "writer", O_RDWR },
{ "create", O_RDWR|O_CREAT },
{ 0, 0 }
};
.Sl
Object P_Dbm_Open(argc, argv) int argc; Object *argv; {
char *p;
DBM *dp;
Object d;
.Sl
Make_C_String(argv[0], p);
dp = dbm_open(p, Symbols_To_Bits(argv[1], 0, Flag_Syms),
argc == 3 ? Get_Integer(argv[2]) : 0644);
if (dp == 0)
return False;
d = Alloc_Object(sizeof(struct S_Dbm), T_Dbm, 0);
DBMF(d)->dbm = dp;
DBMF(d)->alive = 1;
return d;
}
.Sl
void Check_Dbm(d) Object d; {
Check_Type(d, T_Dbm);
if (!DBMF(d)->alive)
Primitive_Error("invalid dbm-file: ~s", d);
}
.Sl
Object P_Dbm_Close(d) Object d; {
Check_Dbm(d);
DBMF(d)->alive = 0;
dbm_close(DBMF(d)->dbm);
return Void;
}
.Se
.Lc "ndbm extension \- implementation of \f2dbm-open\fP and \f2dbm-close\fP
.Le dbm-open
.PP
\f2dbm-open\fP, as it has an optional argument, is a function
with \f2VARARGS\fP calling discipline (not to be confused with the
C language feature of the same name).
This is indicated by the last argument to the corresponding call
to \f2Define_Primitive\fP in the extension initialization.
Primitives of this type receive an array of \f2Objects\fP and
a count.
.PP
The initial call to the macro \f2Make_C_String\fP checks if the
first argument to \f2dbm-open\fP is a string (or a symbol) and
converts it to a C string.
To obtain the second argument to \f2dbm_open()\fP, the symbol
passed to the Scheme primitive (\f2reader\fP, \f2writer\fP, etc.)
has to be mapped to a corresponding flags combination (\f2O_RDONLY\fP,
\f2O_RDWR\fP, etc.).
This is accomplished by the function \f2Symbols_To_Bits()\fP; it
is invoked with a Scheme symbol, a flag indicating whether a single
symbol or a list of symbols (a mask) is to be converted, and a
table of pairs of symbol names and C integers.
The third argument to \f2dbm_open\fP is the filemode;
\f2Get_Integer()\fP converts a Scheme number to a C integer.
\f2dbm-open\fP finally allocates a new Scheme object of type \f2T_Dbm\fP
on the heap, initializes the components of the object, and returns it.
.PP
The auxiliary function \f2Check_Dbm()\fP is used by the remaining
primitives to check whether a given object is of type \f2dbm-file\fP
and if so, whether it is stale.
In this case an error is signaled; \f2Primitive_Error()\fP enters
the error handler of \*E.
.PP
\f2P_Dbm_Close()\fP just marks the object as stale by setting
\f2alive\fP to false and closes the database file.
.PP
Listing @L(dbm-store) shows the implementation of \f2dbm-store\fP and
\f2dbm-fetch\fP.
\f2Make_Integer()\fP is the pendant to \f2Get_Integer()\fP;
it converts a C integer into a Scheme number.
Likewise, \f2Make_String()\fP converts a C string into a
Scheme string.
.bp
.Ls
.Ss
static SYMDESCR Store_Syms[] = {
{ "insert", DBM_INSERT },
{ "replace", DBM_REPLACE },
{ 0, 0 }
};
.Sl
Object P_Dbm_Store(d, key, content, flag) Object d, key, content, flag; {
datum k, c;
int result;
.Sl
Check_Dbm(d);
Check_Type(key, T_String);
Check_Type(content, T_String);
k.dptr = STRING(key)->data; k.dsize = STRING(key)->size;
c.dptr = STRING(content)->data; c.dsize = STRING(content)->size;
result = dbm_store(DBMF(d)->dbm, k, c,
Symbols_To_Bits(flag, 0, Store_Syms));
return Make_Integer(result);
}
.Sl
Object P_Dbm_Fetch(d, key) Object d, key; {
datum k, c;
.Sl
Check_Dbm(d);
Check_Type(key, T_String);
k.dptr = STRING(key)->data; k.dsize = STRING(key)->size;
c = dbm_fetch(DBMF(d)->dbm, k);
return c.dptr ? Make_String(c.dptr, c.dsize) : False;
}
.Se
.Lc "ndbm extension \- implementation of \f2dbm-store\fP and \f2dbm-fetch\fP
.Le dbm-store
.\" ----------
.\" Erwaehnen: Little Languages
.\" ----------
.\" Read the CFI papers again